Forecasting Reference Evapotranspiration (ETo) ¶
Introduction¶
Evapotranspiration is the process by which a crop "loses" water to the atmosphere. It consists of evaporation and transpiration occurring simultaneously. While evaporation is the passage of water from the liquid to the gaseous state (vapor) from any surface (lakes, rivers, pavements, soil, and vegetated areas), transpiration is the "loss" of water from the leaves, flowers, stems, and roots of vegetation.
The main factors affecting the evapotranspiration process are:
- climatic parameters (radiation, air temperature, relative humidity, wind speed);
- factors related to the crop (plant variety, development stage, soil cover, root characteristics);
- environmental and soil-management conditions (soil salinity, soil fertility, limitations on fertilizer use, lack of disease control).
Fig. 1 - Evapotranspiration, according to the factors that affect its calculation.
Fig. 2 - Relationship between the types of evapotranspiration.
Fig. 3 - FAO* Penman-Monteith equation for computing ETo.
*Food and Agriculture Organization of the United Nations (present in Brazil since 1949).
where:
- ETo - reference evapotranspiration [mm.day-1]
- Rn - net radiation at the crop surface [MJ.m-2.day-1]
- G - soil heat flux density [MJ.m-2.day-1]
- T - mean daily air temperature at 2 m height [°C]
- u2 - wind speed at 2 m height [m.s-1]
- es - saturation vapour pressure [kPa]
- ea - actual vapour pressure [kPa]
- (es - ea) - saturation vapour pressure deficit [kPa]
- delta - slope of the vapour pressure curve [kPa.°C-1]
- gama - psychrometric constant [kPa.°C-1]
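For reference, the standard FAO-56 Penman-Monteith formula (the equation shown in Fig. 3) can be written with the symbols listed above as:

```latex
ETo = \frac{0.408\,\Delta\,(R_n - G) \;+\; \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta \;+\; \gamma\,(1 + 0.34\,u_2)}
```

The factor 0.408 converts radiation from MJ.m-2.day-1 to equivalent evaporation in mm.day-1, and 900 is the coefficient for the daily time step of the reference grass crop.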
Advantages:
- Serves as a reference for comparing ET across different periods of the year, different regions, and different crops;
- The calculation depends only on meteorological data;
- Historical records allow forecasting;
- Supports irrigation planning and management.
Disadvantages:
- Depends on the availability of meteorological data.
Required libraries:¶
import os
import numpy as np
from numpy import array
from numpy import concatenate
import pandas as pd
from pandas import concat
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 7
from math import sqrt
from datetime import datetime
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.metrics import mean_absolute_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import statsmodels.tsa.api as smt
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox
from tqdm import tqdm_notebook
from typing import Union
from itertools import product
import pmdarima as pm
from pmdarima.model_selection import train_test_split
from pmdarima.arima import auto_arima
import seaborn as sns
from scipy.stats import chi2
# import geobr
from geopy.geocoders import Nominatim  # required: geolocator is instantiated below
import warnings
warnings.filterwarnings('ignore')
Definition of auxiliary functions:¶
def teste_estacionariedade(X, cutoff=0.01):
    # The cutoff (here 0.01) is the significance level: a p-value below it suggests stationarity
    pvalue = adfuller(X)[1]
    if pvalue < cutoff:
        print('valor-p = ' + str(pvalue) + '. A série ' + X.name + ' parece ser estacionária.')
        return True
    else:
        print('valor-p = ' + str(pvalue) + '. A série ' + X.name + ' parece ser não-estacionária.')
        return False
def split_sequence(sequence, n_lags):
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_lags
        # Check whether the sequence has reached its end
        if end_ix > len(sequence) - 1:
            break
        # Build the x (lagged inputs) and y (target) parts of each sample
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
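A small worked example makes the windowing behavior concrete: with `n_lags=2`, each sample's inputs are two consecutive values and the target is the value that immediately follows them.

```python
from numpy import array

def split_sequence(sequence, n_lags):
    # Slide a window of size n_lags over the series; the value right
    # after each window becomes the prediction target.
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_lags
        if end_ix > len(sequence) - 1:
            break
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix])
    return array(X), array(y)

X, y = split_sequence([10, 20, 30, 40, 50], n_lags=2)
print(X.tolist())  # → [[10, 20], [20, 30], [30, 40]]
print(y.tolist())  # → [30, 40, 50]
```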
def find_coordinates(file_name):
    # Parse latitude/longitude from a file name such as 'lat-19.75_lon-44.45'
    lat = ''
    lon = ''
    previous_coord = file_name.split(sep='_')
    for item in previous_coord:
        if item.startswith('lat'):
            lat = float(item.split(sep='lat')[1])
        elif item.startswith('lon'):
            lon = float(item.split(sep='lon')[1])
    print(f'Coordinates of {file_name}: ({lat}, {lon})')
    location = geolocator.geocode(str(lat) + "," + str(lon))
    print(f'\tLocation: {location}')
def return_coordinates(file_name, lat, lon):
    # Same parsing as find_coordinates, but returning the pair instead of
    # printing it (the reverse-geocoding call is skipped: its result was unused
    # and cost one network request per file).
    previous_coord = file_name.split(sep='_')
    for item in previous_coord:
        if item.startswith('lat'):
            lat = float(item.split(sep='lat')[1])
        elif item.startswith('lon'):
            lon = float(item.split(sep='lon')[1])
    return [lat, lon]
def normalize(df):
    mindf = df.min()
    maxdf = df.max()
    return (df - mindf) / (maxdf - mindf)

def denormalize(norm, _min, _max):
    return [(n * (_max - _min)) + _min for n in norm]
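A minimal round-trip sketch of these two helpers: min-max scaling maps a series to [0, 1], and `denormalize` inverts it given the saved minimum and maximum (the sample values below are illustrative).

```python
import pandas as pd

def normalize(df):
    # Min-max scaling (column-wise for a DataFrame) to the [0, 1] range
    return (df - df.min()) / (df.max() - df.min())

def denormalize(norm, _min, _max):
    # Invert the min-max scaling, element by element
    return [(n * (_max - _min)) + _min for n in norm]

s = pd.Series([2.2, 2.4, 4.7, 6.1], name='ETo')
norm = normalize(s)
restored = denormalize(norm, s.min(), s.max())
print(norm.tolist())      # first value is 0.0, last is 1.0
print(restored)           # → original values, up to float rounding
```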
Choosing one of the 152 available ETo datasets:¶
origin = "datasets/only-clima/"
geolocator = Nominatim(user_agent='geoapiFSAF')
file_names = []
for path, subdirectory, files in os.walk(origin):
    for name in files:
        file_names.append(name.split(sep='.csv')[0])
len(file_names)
152
latitudes = []
longitudes = []
for fn in file_names:
    lat, lon = return_coordinates(fn, latitudes, longitudes)
    latitudes.append(lat)
    longitudes.append(lon)
plt.figure(figsize=(10, 8))
plt.scatter(longitudes, latitudes, marker='.')
The chosen dataset has geographic coordinates latitude -19.75 and longitude -44.45, corresponding to the locality of Pará de Minas/MG, the grid point closest to the meteorological station of Sete Lagoas/MG (lat. -19.46, lon. -44.25), the object of study in the following works:
- LUCAS, Patrícia de Oliveira e. Previsão de Séries Temporais de Evapotranspiração de Referência com Redes Neurais Convolucionais.
- LUCAS, Patrícia de Oliveira e; ALVES, Marcos Antonio; SILVA, Petrônio Cândido de Lima e; GUIMARÃES, Frederico Gadelha. Reference evapotranspiration time series forecasting with ensemble of convolutional neural networks.
- LUCAS, Patrícia de Oliveira e et al. Reference evapotranspiration prediction for precision agriculture using fuzzy time series.
Importing the chosen dataset.
df_target = pd.read_csv('datasets/clima/lat-19.75_lon-44.45.csv', parse_dates=True, index_col='Unnamed: 0')
df_target = df_target.round(decimals=1)
df_target.fillna(0, inplace=True)
df_target.head()
| | Rs | u2 | Tmax | Tmin | RH | pr | ETo |
|---|---|---|---|---|---|---|---|
| 2000-01-01 | 10.5 | 1.0 | 23.2 | 18.9 | 93.7 | 22.2 | 2.2 |
| 2000-01-02 | 11.1 | 1.4 | 25.2 | 19.4 | 90.1 | 24.8 | 2.4 |
| 2000-01-03 | 11.5 | 1.6 | 26.4 | 19.0 | 86.4 | 30.2 | 2.6 |
| 2000-01-04 | 12.3 | 1.3 | 27.6 | 19.0 | 84.2 | 34.7 | 2.8 |
| 2000-01-05 | 21.9 | 1.0 | 31.0 | 19.4 | 73.8 | 9.9 | 4.7 |
df_target.tail()
| | Rs | u2 | Tmax | Tmin | RH | pr | ETo |
|---|---|---|---|---|---|---|---|
| 2019-12-27 | 25.5 | 2.0 | 31.0 | 18.3 | 65.7 | 0.0 | 5.8 |
| 2019-12-28 | 22.6 | 1.6 | 31.0 | 18.0 | 67.5 | 0.0 | 5.1 |
| 2019-12-29 | 23.1 | 1.4 | 31.4 | 18.1 | 70.9 | 0.0 | 5.1 |
| 2019-12-30 | 25.1 | 1.7 | 31.1 | 17.2 | 64.9 | 0.0 | 5.6 |
| 2019-12-31 | 26.8 | 1.9 | 32.1 | 16.7 | 62.6 | 0.0 | 6.1 |
Plotting the ETo series.
df_target['ETo'].plot(title='Dados de Pará de Minas/MG')
plt.legend(loc='best')
plt.show()
Confirming the presence of seasonality.
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(df_target['ETo'])
plt.show()
Analyzing the components of the series.
# Preliminary analysis of the series components: trend, seasonality, and residuals.
decomposicao = seasonal_decompose(df_target['ETo'], period=365)
tendencia = decomposicao.trend
sazonalidade = decomposicao.seasonal
residuo = decomposicao.resid
plt.subplot(411)
plt.plot(df_target['ETo'], label='Original')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(tendencia, label='Trend')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(sazonalidade, label='Seasonality')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residuo, label='Residuals')
plt.legend(loc='best')
plt.tight_layout()
plt.show()
Confirming the seasonality period: semi-annual (period = 183 days).
decomposition=seasonal_decompose(df_target['ETo'], model='additive', period=183)
decomposition.plot()
plt.show()
Checking the stationarity of the series.
teste_estacionariedade(df_target['ETo'])
valor-p = 2.797460563880368e-15. A série ETo parece ser estacionária.
True
dftest = adfuller(df_target['ETo'], autolag = 'AIC')
print("1. ADF : ", dftest[0])
print("2. P-Value : ", dftest[1])
print("3. Num Of Lags : ", dftest[2])
print("4. Num Of Observations Used For ADF Regression and Critical Values Calculation :", dftest[3])
print("5. Critical Values :")
for key, val in dftest[4].items():
    print("\t", key, ": ", val)
1. ADF : -9.144138505074833
2. P-Value : 2.797460563880368e-15
3. Num Of Lags : 19
4. Num Of Observations Used For ADF Regression and Critical Values Calculation : 7285
5. Critical Values :
	 1% : -3.4312479554815933
	 5% : -2.861936826623019
	 10% : -2.5669812265733456
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, sharex=True)
smt.graphics.plot_acf(df_target['ETo'], ax=ax1, lags=20, alpha=0.05, title="Autocorrelação da Série de ETo")
smt.graphics.plot_pacf(df_target['ETo'], ax=ax2, lags=20, alpha=0.05, title="Autocorrelação Parcial da Série de ETo")
plt.show()
Based on the correlograms, the series appears to follow a first-order autoregressive process, AR(1). Nevertheless, we will use pmdarima.arima.auto_arima to try to find the best hyperparameters for the model.
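To illustrate what an AR(1) signature looks like, the sketch below (a synthetic example, not part of the original analysis) simulates x_t = φ·x_{t-1} + ε_t with φ = 0.8 and recovers φ from the lag-1 least-squares regression, the same dependence structure the PACF cutoff at lag 1 reflects.

```python
import numpy as np

# Simulate an AR(1) process and estimate its coefficient from lag-1 pairs.
rng = np.random.default_rng(0)
phi = 0.8
n = 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

# Least-squares estimate of phi from the (x_{t-1}, x_t) pairs
phi_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
print(round(phi_hat, 2))  # close to 0.8
```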
# Running auto_arima on the first 365 days of the series (2000).
stepwise = auto_arima(df_target['ETo'][:365],
start_p=0,
start_q=0,
d=0,
max_p=1,
max_q=1,
start_P=0,
start_Q=0,
D=0,
max_P=1, max_Q=1, max_D=1, max_order=1,
m=183,
seasonal=True,
trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True, maxiter=3)
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[183] intercept : AIC=1088.071, Time=0.14 sec
 ARIMA(1,0,0)(1,0,0)[183] intercept : AIC=inf, Time=22.88 sec
 ARIMA(0,0,1)(0,0,1)[183] intercept : AIC=inf, Time=21.64 sec
 ARIMA(0,0,0)(0,0,0)[183]           : AIC=2022.136, Time=0.01 sec
 ARIMA(0,0,0)(1,0,0)[183] intercept : AIC=inf, Time=18.71 sec
 ARIMA(0,0,0)(0,0,1)[183] intercept : AIC=inf, Time=31.81 sec
 ARIMA(0,0,0)(1,0,1)[183] intercept : AIC=inf, Time=36.61 sec
 ARIMA(1,0,0)(0,0,0)[183] intercept : AIC=752.189, Time=0.04 sec
 ARIMA(1,0,0)(0,0,1)[183] intercept : AIC=751.513, Time=21.12 sec
 ARIMA(1,0,0)(1,0,1)[183] intercept : AIC=753.552, Time=28.05 sec
 ARIMA(1,0,1)(0,0,1)[183] intercept : AIC=753.262, Time=25.35 sec
 ARIMA(1,0,0)(0,0,1)[183]           : AIC=792.227, Time=20.93 sec

Best model: ARIMA(1,0,0)(0,0,1)[183] intercept
Total fit time: 231.891 seconds
# Running auto_arima on the second year of the series (2001).
stepwise = auto_arima(df_target['ETo'][365:730],
start_p=0,
start_q=0,
d=0,
max_p=1,
max_q=1,
start_P=0,
start_Q=0,
D=0,
max_P=1, max_Q=1, max_D=1, max_order=1,
m=183,
seasonal=True,
trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True, maxiter=3)
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[183] intercept : AIC=1076.032, Time=0.12 sec
 ARIMA(1,0,0)(1,0,0)[183] intercept : AIC=inf, Time=26.16 sec
 ARIMA(0,0,1)(0,0,1)[183] intercept : AIC=894.816, Time=23.54 sec
 ARIMA(0,0,0)(0,0,0)[183]           : AIC=2044.827, Time=0.01 sec
 ARIMA(0,0,1)(0,0,0)[183] intercept : AIC=894.833, Time=0.09 sec
 ARIMA(0,0,1)(1,0,1)[183] intercept : AIC=898.971, Time=28.87 sec
 ARIMA(0,0,1)(1,0,0)[183] intercept : AIC=1843.685, Time=23.25 sec
 ARIMA(0,0,0)(0,0,1)[183] intercept : AIC=inf, Time=22.35 sec
 ARIMA(1,0,1)(0,0,1)[183] intercept : AIC=793.837, Time=32.09 sec
 ARIMA(1,0,1)(0,0,0)[183] intercept : AIC=791.840, Time=0.05 sec
 ARIMA(1,0,1)(1,0,0)[183] intercept : AIC=1152.008, Time=28.00 sec
 ARIMA(1,0,1)(1,0,1)[183] intercept : AIC=795.838, Time=33.11 sec
 ARIMA(1,0,0)(0,0,0)[183] intercept : AIC=789.956, Time=0.08 sec
 ARIMA(1,0,0)(0,0,1)[183] intercept : AIC=791.953, Time=41.40 sec
 ARIMA(1,0,0)(1,0,1)[183] intercept : AIC=793.953, Time=61.31 sec
 ARIMA(1,0,0)(0,0,0)[183]           : AIC=836.858, Time=4.75 sec

Best model: ARIMA(1,0,0)(0,0,0)[183] intercept
Total fit time: 329.653 seconds
# Running auto_arima on the fifth year of the series (2004).
stepwise = auto_arima(df_target['ETo'][1460:1825],
start_p=0,
start_q=0,
d=0,
max_p=1,
max_q=1,
start_P=0,
start_Q=0,
D=0,
max_P=1, max_Q=1, max_D=1, max_order=1,
m=183,
seasonal=True,
trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True, maxiter=3)
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[183] intercept : AIC=1129.985, Time=0.03 sec
 ARIMA(1,0,0)(1,0,0)[183] intercept : AIC=758.524, Time=55.25 sec
 ARIMA(0,0,1)(0,0,1)[183] intercept : AIC=900.992, Time=48.65 sec
 ARIMA(0,0,0)(0,0,0)[183]           : AIC=2020.728, Time=0.03 sec
 ARIMA(1,0,0)(0,0,0)[183] intercept : AIC=756.542, Time=0.04 sec
 ARIMA(1,0,0)(0,0,1)[183] intercept : AIC=758.523, Time=26.56 sec
 ARIMA(1,0,0)(1,0,1)[183] intercept : AIC=760.518, Time=28.04 sec
 ARIMA(1,0,1)(0,0,0)[183] intercept : AIC=757.391, Time=0.05 sec
 ARIMA(0,0,1)(0,0,0)[183] intercept : AIC=903.256, Time=0.04 sec
 ARIMA(1,0,0)(0,0,0)[183]           : AIC=792.457, Time=0.03 sec

Best model: ARIMA(1,0,0)(0,0,0)[183] intercept
Total fit time: 158.974 seconds
# Running auto_arima on the seventh year of the series (2006).
stepwise = auto_arima(df_target['ETo'][2190:2555],
start_p=0,
start_q=0,
d=0,
max_p=1,
max_q=1,
start_P=0,
start_Q=0,
D=0,
max_P=1, max_Q=1, max_D=1, max_order=1,
m=183,
seasonal=True,
trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True, maxiter=3)
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[183] intercept : AIC=1082.285, Time=0.03 sec
 ARIMA(1,0,0)(1,0,0)[183] intercept : AIC=inf, Time=26.11 sec
 ARIMA(0,0,1)(0,0,1)[183] intercept : AIC=887.915, Time=21.99 sec
 ARIMA(0,0,0)(0,0,0)[183]           : AIC=2010.069, Time=0.01 sec
 ARIMA(0,0,1)(0,0,0)[183] intercept : AIC=890.569, Time=0.04 sec
 ARIMA(0,0,1)(1,0,1)[183] intercept : AIC=891.581, Time=29.70 sec
 ARIMA(0,0,1)(1,0,0)[183] intercept : AIC=1951.420, Time=21.18 sec
 ARIMA(0,0,0)(0,0,1)[183] intercept : AIC=inf, Time=19.54 sec
 ARIMA(1,0,1)(0,0,1)[183] intercept : AIC=757.277, Time=29.23 sec
 ARIMA(1,0,1)(0,0,0)[183] intercept : AIC=755.240, Time=0.06 sec
 ARIMA(1,0,1)(1,0,0)[183] intercept : AIC=1065.477, Time=33.99 sec
 ARIMA(1,0,1)(1,0,1)[183] intercept : AIC=759.287, Time=35.75 sec
 ARIMA(1,0,0)(0,0,0)[183] intercept : AIC=754.425, Time=0.04 sec
 ARIMA(1,0,0)(0,0,1)[183] intercept : AIC=756.387, Time=26.15 sec
 ARIMA(1,0,0)(1,0,1)[183] intercept : AIC=758.390, Time=43.41 sec
 ARIMA(1,0,0)(0,0,0)[183]           : AIC=796.675, Time=0.06 sec

Best model: ARIMA(1,0,0)(0,0,0)[183] intercept
Total fit time: 288.081 seconds
# Running auto_arima on the nineteenth year of the series (2018).
stepwise = auto_arima(df_target['ETo'][6570:6935],
start_p=0,
start_q=0,
d=0,
max_p=1,
max_q=1,
start_P=0,
start_Q=0,
D=0,
max_P=1, max_Q=1, max_D=1, max_order=1,
m=183,
seasonal=True,
trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True, maxiter=3)
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[183] intercept : AIC=1140.097, Time=0.08 sec
 ARIMA(1,0,0)(1,0,0)[183] intercept : AIC=inf, Time=26.66 sec
 ARIMA(0,0,1)(0,0,1)[183] intercept : AIC=982.874, Time=23.38 sec
 ARIMA(0,0,0)(0,0,0)[183]           : AIC=2048.352, Time=0.04 sec
 ARIMA(0,0,1)(0,0,0)[183] intercept : AIC=982.628, Time=0.04 sec
 ARIMA(0,0,1)(1,0,0)[183] intercept : AIC=3744.270, Time=21.50 sec
 ARIMA(0,0,1)(1,0,1)[183] intercept : AIC=985.317, Time=31.96 sec
 ARIMA(1,0,1)(0,0,0)[183] intercept : AIC=867.874, Time=0.07 sec
 ARIMA(1,0,1)(1,0,0)[183] intercept : AIC=1405.738, Time=28.00 sec
 ARIMA(1,0,1)(0,0,1)[183] intercept : AIC=869.622, Time=26.59 sec
 ARIMA(1,0,1)(1,0,1)[183] intercept : AIC=871.700, Time=33.25 sec
 ARIMA(1,0,0)(0,0,0)[183] intercept : AIC=870.634, Time=0.04 sec
 ARIMA(1,0,1)(0,0,0)[183]           : AIC=897.641, Time=0.03 sec

Best model: ARIMA(1,0,1)(0,0,0)[183] intercept
Total fit time: 192.134 seconds
# Running auto_arima on the last 365 days of the series (2019).
stepwise = auto_arima(df_target['ETo'][6940:7305],
start_p=0,
start_q=0,
d=0,
max_p=1,
max_q=1,
start_P=0,
start_Q=0,
D=0,
max_P=1, max_Q=1, max_D=1, max_order=1,
m=183,
seasonal=True,
trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True, maxiter=3)
Performing stepwise search to minimize aic
 ARIMA(0,0,0)(0,0,0)[183] intercept : AIC=1203.577, Time=0.19 sec
 ARIMA(1,0,0)(1,0,0)[183] intercept : AIC=inf, Time=21.01 sec
 ARIMA(0,0,1)(0,0,1)[183] intercept : AIC=inf, Time=22.72 sec
 ARIMA(0,0,0)(0,0,0)[183]           : AIC=2099.244, Time=0.02 sec
 ARIMA(0,0,0)(1,0,0)[183] intercept : AIC=inf, Time=17.07 sec
 ARIMA(0,0,0)(0,0,1)[183] intercept : AIC=inf, Time=18.30 sec
 ARIMA(0,0,0)(1,0,1)[183] intercept : AIC=inf, Time=25.63 sec
 ARIMA(1,0,0)(0,0,0)[183] intercept : AIC=847.188, Time=0.07 sec
 ARIMA(1,0,0)(0,0,1)[183] intercept : AIC=849.187, Time=24.97 sec
 ARIMA(1,0,0)(1,0,1)[183] intercept : AIC=851.187, Time=32.13 sec
 ARIMA(1,0,1)(0,0,0)[183] intercept : AIC=839.578, Time=0.05 sec
 ARIMA(1,0,1)(1,0,0)[183] intercept : AIC=1356.271, Time=38.88 sec
 ARIMA(1,0,1)(0,0,1)[183] intercept : AIC=841.572, Time=27.10 sec
 ARIMA(1,0,1)(1,0,1)[183] intercept : AIC=843.610, Time=47.15 sec
 ARIMA(0,0,1)(0,0,0)[183] intercept : AIC=1008.495, Time=0.81 sec
 ARIMA(1,0,1)(0,0,0)[183]           : AIC=856.753, Time=0.35 sec

Best model: ARIMA(1,0,1)(0,0,0)[183] intercept
Total fit time: 277.221 seconds
# Normalize Data
# Save Min-Max for Denorm
min_raw = df_target.min()
max_raw = df_target.max()
# Perform Normalization
norm_df = normalize(df_target)
df = df_target
# ARIMA model, fitted to the normalized ETo series.
# model = ARIMA(df_target['ETo'], order=(1,0,0))
model = ARIMA(norm_df['ETo'], order=(1,0,0))
model_fit = model.fit()
print(model_fit.summary())
# Splitting the dataset into training and test sets, in the proportion 80%/20%.
# X = df_target['ETo'].values
X = norm_df['ETo'].values
size = int(len(X) * 0.80)
train, test = X[0:size], X[size:len(X)]
SARIMAX Results
==============================================================================
Dep. Variable: ETo No. Observations: 7305
Model: ARIMA(1, 0, 0) Log Likelihood 5157.796
Date: Mon, 11 Dec 2023 AIC -10309.592
Time: 22:11:59 BIC -10288.903
Sample: 01-01-2000 HQIC -10302.478
- 12-31-2019
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 0.4826 0.007 72.618 0.000 0.470 0.496
ar.L1 0.7889 0.008 93.116 0.000 0.772 0.806
sigma2 0.0143 0.000 70.864 0.000 0.014 0.015
===================================================================================
Ljung-Box (L1) (Q): 5.39 Jarque-Bera (JB): 970.04
Prob(Q): 0.02 Prob(JB): 0.00
Heteroskedasticity (H): 1.27 Skew: -0.14
Prob(H) (two-sided): 0.00 Kurtosis: 4.76
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
# Applying rolling forecasting: refit the model after each new observation.
history = [x for x in train]
predictions = []
n_prints = 3
for t in range(len(test)):
    model = ARIMA(history, order=(1,0,0))
    model_fit = model.fit()
    output = model_fit.forecast()
    yhat = output[0]
    predictions.append(yhat)
    obs = test[t]
    history.append(obs)
    # Print only the first and last n_prints steps
    if t < n_prints or t >= (len(test) - n_prints):
        print('previsão=%f, observado=%f' % (yhat, obs))
previsão=0.680182, observado=0.716667
previsão=0.667001, observado=0.433333
...
previsão=0.442556, observado=0.316667
previsão=0.640897, observado=0.766667
previsão=0.706656, observado=0.850000
# Measuring model performance with the MAE, RMSE, and MAPE metrics
mae_ar = mean_absolute_error(test, predictions)
print('Test MAE: %.3f' % mae_ar)
rmse_ar = sqrt(mean_squared_error(test, predictions))
print('Test RMSE: %.3f' % rmse_ar)
mape_ar = mean_absolute_percentage_error(test, predictions)
print('Test MAPE: %.3f' % mape_ar)
plt.title("ETo de Pará de Minas/MG")
plt.plot(test, label='Test')
plt.plot(predictions, color='red', label='Prediction')
plt.legend(loc='best')
plt.show()
Test MAE: 0.091
Test RMSE: 0.126
Test MAPE: 0.259
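Note that these metrics are computed on the normalized [0, 1] scale. To report errors in mm/day, the predictions and observations can be denormalized first with the saved min/max before computing the metric. A sketch with hypothetical values (the `eto_min`/`eto_max` and series below are illustrative, not the actual data):

```python
import numpy as np

def denormalize(norm, _min, _max):
    # Invert the min-max scaling applied before fitting
    return [(n * (_max - _min)) + _min for n in norm]

# Hypothetical normalized observations/predictions and series min/max (mm/day)
eto_min, eto_max = 0.0, 10.0
test_norm = [0.72, 0.43, 0.32]
pred_norm = [0.68, 0.67, 0.44]

test_mm = np.array(denormalize(test_norm, eto_min, eto_max))
pred_mm = np.array(denormalize(pred_norm, eto_min, eto_max))
mae_mm = np.mean(np.abs(test_mm - pred_mm))
print(round(mae_mm, 2))  # → 1.33
```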
# Plotting the performance of the ARIMA model.
fig, ax = plt.subplots()
ax.plot(df_target['ETo'].index[0:size], train, 'g-.', label='Train')
ax.plot(df_target['ETo'].index[size:len(X)], test, 'b-', label='Test')
ax.plot(df_target['ETo'].index[size:len(X)], predictions, 'r--', label='Predicted')
ax.legend(loc='best')
ax.set_xlabel('Year')
ax.set_ylabel('Value')
# ax.axvline(df_target['ETo'].index[size], color='#808080', linestyle='-', alpha=0.2)
# ax.axvline(df_target['ETo'].index[len(X) - 1], color='#808080', linestyle='-', alpha=0.2)
ax.axvspan(df_target['ETo'].index[size], df_target['ETo'].index[len(X) - 1], color='#808080', alpha=0.2)  # gray zone over the test region
plt.title("ETo de Pará de Minas/MG")
fig.autofmt_xdate()
plt.tight_layout()
The multivariate forecasts begin here.¶
df_target.head()
| | Rs | u2 | Tmax | Tmin | RH | pr | ETo |
|---|---|---|---|---|---|---|---|
| 2000-01-01 | 10.5 | 1.0 | 23.2 | 18.9 | 93.7 | 22.2 | 2.2 |
| 2000-01-02 | 11.1 | 1.4 | 25.2 | 19.4 | 90.1 | 24.8 | 2.4 |
| 2000-01-03 | 11.5 | 1.6 | 26.4 | 19.0 | 86.4 | 30.2 | 2.6 |
| 2000-01-04 | 12.3 | 1.3 | 27.6 | 19.0 | 84.2 | 34.7 | 2.8 |
| 2000-01-05 | 21.9 | 1.0 | 31.0 | 19.4 | 73.8 | 9.9 | 4.7 |
Exogenous variables (predictors, input variables):
- Rs - Solar radiation
- u2 - Wind speed
- Tmax - Maximum temperature
- Tmin - Minimum temperature
- RH - Relative humidity
- pr - Precipitation
Endogenous variable (target variable, variable of interest, output variable):
- ETo - Reference Evapotranspiration
df_target.plot(title='Dados de Pará de Minas/MG')
plt.legend(loc='best')
plt.show()
values = df_target.values
# Specify the columns to plot
groups = [0, 1, 2, 3, 4, 5, 6]
i = 1
# Plot each column
plt.figure()
for group in groups:
    plt.subplot(len(groups), 1, i)
    plt.plot(values[:, group])
    plt.title(df_target.columns[group], y=0.5, loc='right')
    i += 1
plt.show()
Separating the target variable from the exogenous variables.
target = df_target[['ETo']]
target.name = 'ETo'
target.head()
| | ETo |
|---|---|
| 2000-01-01 | 2.2 |
| 2000-01-02 | 2.4 |
| 2000-01-03 | 2.6 |
| 2000-01-04 | 2.8 |
| 2000-01-05 | 4.7 |
exog = df_target[['Rs', 'u2', 'Tmax', 'Tmin', 'RH', 'pr']]
exog.name = 'Exogenous Variables'
exog.head()
| | Rs | u2 | Tmax | Tmin | RH | pr |
|---|---|---|---|---|---|---|
| 2000-01-01 | 10.5 | 1.0 | 23.2 | 18.9 | 93.7 | 22.2 |
| 2000-01-02 | 11.1 | 1.4 | 25.2 | 19.4 | 90.1 | 24.8 |
| 2000-01-03 | 11.5 | 1.6 | 26.4 | 19.0 | 86.4 | 30.2 |
| 2000-01-04 | 12.3 | 1.3 | 27.6 | 19.0 | 84.2 | 34.7 |
| 2000-01-05 | 21.9 | 1.0 | 31.0 | 19.4 | 73.8 | 9.9 |
Testing the stationarity of the target series.
teste_estacionariedade(target)
valor-p = 2.7974605638801155e-15. A série ETo parece ser estacionária.
True
from statsmodels.tsa.api import VAR
def df_derived_by_shift(df, lag=0, NON_DER=[]):
    # For each column not in NON_DER, append lagged copies named col_1 ... col_lag
    df = df.copy()
    if not lag:
        return df
    cols = {}
    for i in range(1, lag + 1):
        for x in list(df.columns):
            if x not in NON_DER:
                if x not in cols:
                    cols[x] = ['{}_{}'.format(x, i)]
                else:
                    cols[x].append('{}_{}'.format(x, i))
    for k, v in cols.items():
        columns = v
        dfn = pd.DataFrame(data=None, columns=columns, index=df.index)
        i = 1
        for c in columns:
            dfn[c] = df[k].shift(periods=i)
            i += 1
        df = pd.concat([df, dfn], axis=1)
    return df
# Normalize Data
# Save Min-Max for Denorm
min_raw = df_target.min()
max_raw = df_target.max()
# Perform Normalization
norm_df = normalize(df_target)
df = df_target
norm_df.head()
| | Rs | u2 | Tmax | Tmin | RH | pr | ETo |
|---|---|---|---|---|---|---|---|
| 2000-01-01 | 0.243542 | 0.25000 | 0.268868 | 0.816038 | 0.986175 | 0.205937 | 0.200000 |
| 2000-01-02 | 0.265683 | 0.37500 | 0.363208 | 0.839623 | 0.930876 | 0.230056 | 0.233333 |
| 2000-01-03 | 0.280443 | 0.43750 | 0.419811 | 0.820755 | 0.874040 | 0.280148 | 0.266667 |
| 2000-01-04 | 0.309963 | 0.34375 | 0.476415 | 0.820755 | 0.840246 | 0.321892 | 0.300000 |
| 2000-01-05 | 0.664207 | 0.25000 | 0.636792 | 0.839623 | 0.680492 | 0.091837 | 0.616667 |
# Splitting the dataset into training and test sets, in the proportion 80%/20%.
size = int(len(norm_df) * 0.80)
train, test = norm_df[0:size], norm_df[size:len(norm_df)]
# VAR model. In the summary below, y1...y7 correspond to the columns
# Rs, u2, Tmax, Tmin, RH, pr, and ETo, in that order.
order = 3
model = VAR(train.values)
model_fit = model.fit(maxlags=order)
lag_order = model_fit.k_ar
print(f'lag_order = {lag_order}\n')
print(model_fit.summary())
lag_order = 3
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Mon, 11, Dec, 2023
Time: 22:30:55
--------------------------------------------------------------------
No. of Equations: 7.00000 BIC: -38.8687
Nobs: 5841.00 HQIC: -38.9834
Log likelihood: 56167.7 FPE: 1.10442e-17
AIC: -39.0446 Det(Omega_mle): 1.07574e-17
--------------------------------------------------------------------
Results for equation y1
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 0.337793 0.029471 11.462 0.000
L1.y1 0.153082 0.067637 2.263 0.024
L1.y2 -0.140028 0.030479 -4.594 0.000
L1.y3 -0.288964 0.037112 -7.786 0.000
L1.y4 -0.124168 0.035764 -3.472 0.001
L1.y5 -0.076731 0.033016 -2.324 0.020
L1.y6 0.028181 0.026531 1.062 0.288
L1.y7 0.597346 0.093981 6.356 0.000
L2.y1 -0.010511 0.073283 -0.143 0.886
L2.y2 0.033535 0.034195 0.981 0.327
L2.y3 0.032007 0.041163 0.778 0.437
L2.y4 0.015724 0.039733 0.396 0.692
L2.y5 0.045582 0.039146 1.164 0.244
L2.y6 0.081578 0.026663 3.060 0.002
L2.y7 0.047763 0.104510 0.457 0.648
L3.y1 -0.025231 0.066502 -0.379 0.704
L3.y2 -0.039924 0.031466 -1.269 0.205
L3.y3 -0.007293 0.033934 -0.215 0.830
L3.y4 0.023225 0.033517 0.693 0.488
L3.y5 0.076131 0.034186 2.227 0.026
L3.y6 -0.033234 0.026103 -1.273 0.203
L3.y7 0.107963 0.094382 1.144 0.253
========================================================================
Results for equation y2
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 0.280645 0.021634 12.972 0.000
L1.y1 -0.140443 0.049651 -2.829 0.005
L1.y2 0.469955 0.022374 21.005 0.000
L1.y3 -0.064513 0.027243 -2.368 0.018
L1.y4 0.034738 0.026253 1.323 0.186
L1.y5 -0.153016 0.024236 -6.313 0.000
L1.y6 0.003421 0.019476 0.176 0.861
L1.y7 0.155128 0.068990 2.249 0.025
L2.y1 -0.136538 0.053796 -2.538 0.011
L2.y2 -0.056137 0.025102 -2.236 0.025
L2.y3 0.052977 0.030217 1.753 0.080
L2.y4 -0.081352 0.029167 -2.789 0.005
L2.y5 0.059432 0.028736 2.068 0.039
L2.y6 0.004443 0.019573 0.227 0.820
L2.y7 0.123180 0.076719 1.606 0.108
L3.y1 -0.064224 0.048818 -1.316 0.188
L3.y2 0.080862 0.023099 3.501 0.000
L3.y3 0.004188 0.024910 0.168 0.866
L3.y4 -0.002234 0.024604 -0.091 0.928
L3.y5 -0.006005 0.025095 -0.239 0.811
L3.y6 0.019104 0.019162 0.997 0.319
L3.y7 0.052985 0.069284 0.765 0.444
========================================================================
Results for equation y3
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 0.215614 0.015183 14.201 0.000
L1.y1 0.035070 0.034846 1.006 0.314
L1.y2 -0.262577 0.015702 -16.722 0.000
L1.y3 0.440181 0.019120 23.023 0.000
L1.y4 0.105685 0.018425 5.736 0.000
L1.y5 -0.108400 0.017010 -6.373 0.000
L1.y6 -0.000875 0.013668 -0.064 0.949
L1.y7 0.304156 0.048418 6.282 0.000
L2.y1 0.067322 0.037755 1.783 0.075
L2.y2 0.131608 0.017617 7.471 0.000
L2.y3 0.044007 0.021207 2.075 0.038
L2.y4 -0.014950 0.020470 -0.730 0.465
L2.y5 -0.001321 0.020168 -0.066 0.948
L2.y6 0.033876 0.013736 2.466 0.014
L2.y7 -0.199418 0.053843 -3.704 0.000
L3.y1 -0.094470 0.034261 -2.757 0.006
L3.y2 -0.001927 0.016211 -0.119 0.905
L3.y3 0.043795 0.017482 2.505 0.012
L3.y4 -0.021245 0.017268 -1.230 0.219
L3.y5 0.019848 0.017612 1.127 0.260
L3.y6 0.003339 0.013448 0.248 0.804
L3.y7 0.095805 0.048625 1.970 0.049
========================================================================
Results for equation y4
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 0.090009 0.013845 6.501 0.000
L1.y1 -0.298909 0.031774 -9.407 0.000
L1.y2 -0.203510 0.014318 -14.214 0.000
L1.y3 0.195550 0.017434 11.217 0.000
L1.y4 0.602033 0.016801 35.834 0.000
L1.y5 0.128608 0.015510 8.292 0.000
L1.y6 0.037764 0.012463 3.030 0.002
L1.y7 0.400819 0.044149 9.079 0.000
L2.y1 0.192060 0.034426 5.579 0.000
L2.y2 0.092546 0.016064 5.761 0.000
L2.y3 -0.074405 0.019337 -3.848 0.000
L2.y4 0.107889 0.018665 5.780 0.000
L2.y5 -0.076377 0.018389 -4.153 0.000
L2.y6 0.023572 0.012525 1.882 0.060
L2.y7 -0.279247 0.049095 -5.688 0.000
L3.y1 -0.081435 0.031240 -2.607 0.009
L3.y2 -0.048108 0.014782 -3.255 0.001
L3.y3 -0.099461 0.015941 -6.239 0.000
L3.y4 0.083390 0.015745 5.296 0.000
L3.y5 0.003746 0.016059 0.233 0.816
L3.y6 0.021637 0.012262 1.764 0.078
L3.y7 0.199781 0.044337 4.506 0.000
========================================================================
Results for equation y5
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const -0.007167 0.020294 -0.353 0.724
L1.y1 0.042802 0.046576 0.919 0.358
L1.y2 0.038031 0.020988 1.812 0.070
L1.y3 0.126728 0.025556 4.959 0.000
L1.y4 0.110598 0.024628 4.491 0.000
L1.y5 0.724546 0.022736 31.868 0.000
L1.y6 -0.057504 0.018269 -3.148 0.002
L1.y7 -0.166633 0.064717 -2.575 0.010
L2.y1 0.053567 0.050464 1.061 0.288
L2.y2 -0.033198 0.023547 -1.410 0.159
L2.y3 -0.009882 0.028346 -0.349 0.727
L2.y4 -0.038585 0.027361 -1.410 0.158
L2.y5 0.026103 0.026957 0.968 0.333
L2.y6 -0.051652 0.018361 -2.813 0.005
L2.y7 -0.016959 0.071968 -0.236 0.814
L3.y1 0.035306 0.045795 0.771 0.441
L3.y2 -0.031083 0.021668 -1.434 0.151
L3.y3 -0.028325 0.023368 -1.212 0.225
L3.y4 -0.013239 0.023081 -0.574 0.566
L3.y5 0.109588 0.023541 4.655 0.000
L3.y6 0.021769 0.017975 1.211 0.226
L3.y7 0.058577 0.064993 0.901 0.367
========================================================================
Results for equation y6
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const -0.019080 0.014921 -1.279 0.201
L1.y1 0.099052 0.034245 2.892 0.004
L1.y2 0.064682 0.015432 4.191 0.000
L1.y3 0.135060 0.018790 7.188 0.000
L1.y4 0.022096 0.018107 1.220 0.222
L1.y5 0.219222 0.016716 13.114 0.000
L1.y6 0.148041 0.013433 11.021 0.000
L1.y7 -0.257301 0.047583 -5.407 0.000
L2.y1 -0.170807 0.037104 -4.603 0.000
L2.y2 -0.091633 0.017313 -5.293 0.000
L2.y3 -0.127643 0.020841 -6.124 0.000
L2.y4 -0.038244 0.020117 -1.901 0.057
L2.y5 -0.070009 0.019820 -3.532 0.000
L2.y6 0.093356 0.013500 6.915 0.000
L2.y7 0.284543 0.052914 5.377 0.000
L3.y1 -0.181470 0.033671 -5.390 0.000
L3.y2 -0.054435 0.015932 -3.417 0.001
L3.y3 -0.056016 0.017181 -3.260 0.001
L3.y4 -0.027106 0.016970 -1.597 0.110
L3.y5 0.024112 0.017309 1.393 0.164
L3.y6 0.059068 0.013216 4.469 0.000
L3.y7 0.304197 0.047786 6.366 0.000
========================================================================
Results for equation y7
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 0.304923 0.024622 12.384 0.000
L1.y1 -0.279235 0.056509 -4.941 0.000
L1.y2 -0.181726 0.025464 -7.137 0.000
L1.y3 -0.211371 0.031006 -6.817 0.000
L1.y4 -0.025242 0.029880 -0.845 0.398
L1.y5 -0.143211 0.027584 -5.192 0.000
L1.y6 0.061639 0.022166 2.781 0.005
L1.y7 1.066377 0.078519 13.581 0.000
L2.y1 -0.092224 0.061226 -1.506 0.132
L2.y2 0.028722 0.028569 1.005 0.315
L2.y3 0.011346 0.034391 0.330 0.741
L2.y4 -0.011722 0.033196 -0.353 0.724
L2.y5 0.043393 0.032706 1.327 0.185
L2.y6 0.101063 0.022276 4.537 0.000
L2.y7 0.106787 0.087316 1.223 0.221
L3.y1 -0.166293 0.055561 -2.993 0.003
L3.y2 -0.039785 0.026289 -1.513 0.130
L3.y3 -0.039566 0.028351 -1.396 0.163
L3.y4 0.002645 0.028003 0.094 0.925
L3.y5 0.068127 0.028562 2.385 0.017
L3.y6 -0.007268 0.021809 -0.333 0.739
L3.y7 0.281473 0.078854 3.570 0.000
========================================================================
Correlation matrix of residuals
y1 y2 y3 y4 y5 y6 y7
y1 1.000000 0.079552 0.498897 -0.365631 -0.753555 -0.208583 0.952331
y2 0.079552 1.000000 -0.238870 0.043716 -0.025636 -0.055213 0.264704
y3 0.498897 -0.238870 1.000000 -0.015786 -0.439831 -0.141851 0.529649
y4 -0.365631 0.043716 -0.015786 1.000000 0.373942 0.030056 -0.236243
y5 -0.753555 -0.025636 -0.439831 0.373942 1.000000 0.217675 -0.763868
y6 -0.208583 -0.055213 -0.141851 0.030056 0.217675 1.000000 -0.215717
y7 0.952331 0.264704 0.529649 -0.236243 -0.763868 -0.215717 1.000000
fc = model_fit.forecast(y=test.values, steps=3)  # forecast 3 steps ahead for all variables (remember to denormalize!)
dfc = denormalize(fc, min_raw, max_raw)
# Applying the forecast to the data using rolling forecasting.
order = 1
history = [x for x in train]
predictions = []
for t in range(len(test)):
    model = VAR(train.values)
    model_fit = model.fit(maxlags=order)
    fc = model_fit.forecast(y=test[:t+1].values, steps=1)
    yhat = fc[0][6]  # column 6 holds the normalized ETo
    predictions.append(yhat)
    obs = test['ETo'].iloc[t]
    history.append(obs)
    if t < 3 or t >= (len(test) - 3):  # print only the first and last 3 steps
        print('predicted=%f, observed=%f' % (yhat, obs))
predicted=0.672375, observed=0.716667 predicted=0.437767, observed=0.433333 predicted=0.349767, observed=0.316667 ... predicted=0.639124, observed=0.683333 predicted=0.708566, observed=0.766667 predicted=0.768693, observed=0.850000
# Measuring the model's performance with the MAE, RMSE and MAPE metrics
mae_var_int = mean_absolute_error(test['ETo'], predictions)
print('Test MAE: %.3f' % mae_var_int)
rmse_var_int = sqrt(mean_squared_error(test['ETo'], predictions))
print('Test RMSE: %.3f' % rmse_var_int)
mape_var_int = mean_absolute_percentage_error(test['ETo'], predictions)
print('Test MAPE: %.3f' % mape_var_int)
Test MAE: 0.042 Test RMSE: 0.052 Test MAPE: 0.124
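Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction rather than a percentage, so the 0.124 above corresponds to roughly 12.4% mean absolute percentage error. The metric itself reduces to a short expression (a plain-Python sketch, not the sklearn implementation):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (multiply by 100 for %).
    Mirrors the behaviour of sklearn.metrics.mean_absolute_percentage_error."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

print(mape([2.0, 4.0], [1.0, 5.0]))  # (0.5 + 0.25) / 2 = 0.375
```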
fig, ax = plt.subplots()
ax.plot(df_target['ETo'].index[size:len(df_target)], test['ETo'], 'b-', label='Test')
ax.plot(df_target['ETo'].index[size:len(df_target)], predictions, 'r--', label='Predicted')
ax.legend(loc='best')
ax.set_xlabel('Year')
ax.set_ylabel('Value')
plt.title("ETo of Pará de Minas/MG")
fig.autofmt_xdate()
plt.tight_layout()
# Illustrating graphically the performance of the VAR model.
fig, ax = plt.subplots()
ax.plot(df_target['ETo'].index[0:size], train['ETo'], 'g-.', label='Train')
ax.plot(df_target['ETo'].index[size:len(df_target)], test['ETo'], 'b-', label='Test')
ax.plot(df_target['ETo'].index[size:len(df_target)], predictions, 'r--', label='Predicted')
ax.legend(loc='best')
ax.set_xlabel('Year')
ax.set_ylabel('Value')
plt.title("ETo of Pará de Minas/MG")
fig.autofmt_xdate()
plt.tight_layout()
Here the idea is to use the ETo series from other stations for the multivariate forecast.¶
origin = "datasets/clima/mg/"
geolocator = Nominatim(user_agent='geoapiFSAF')
file_names = []
for path, subdirectory, files in os.walk(origin):
    for name in files:
        file_names.append(name.split(sep='.csv')[0])
len(file_names)
12
latitudes = []
longitudes = []
for fn in file_names:
    lat, lon = return_coordinates(fn, latitudes, longitudes)
    latitudes.append(lat)
    longitudes.append(lon)
-15.35, -42.25 -15.35, -44.45 -15.35, -46.65 -17.55, -42.25 -17.55, -44.45 -17.55, -46.65 -19.75, -42.25 -19.75, -44.45 -19.75, -46.65 -19.75, -48.85 -21.95, -44.45 -21.95, -46.65
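`return_coordinates` is defined earlier in the notebook. Assuming it parses the latitude/longitude pair out of file names such as `lat-17.55_lon-44.45` (the pattern used by the CSV files below), a minimal hypothetical sketch of that parsing step could be:

```python
def parse_coordinates(file_name):
    """Parse 'lat<lat>_lon<lon>' file names into (lat, lon) floats.
    Hypothetical stand-in for the notebook's return_coordinates helper."""
    lat_part, lon_part = file_name.split('_')
    lat = float(lat_part.replace('lat', ''))
    lon = float(lon_part.replace('lon', ''))
    return lat, lon

print(parse_coordinates('lat-17.55_lon-44.45'))  # (-17.55, -44.45)
```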
all_muni = geobr.read_municipality(code_muni="MG", year=2022)
# plot
fig, ax = plt.subplots(figsize=(10, 10), dpi=200)
all_muni.plot(facecolor="#d2c1af", edgecolor="#FEBF57", ax=ax)
plt.scatter(longitudes, latitudes, marker='.', color='green', label='Available stations')
plt.scatter(-44.17, -19.46, marker='*', color='red', label='Sete Lagoas')
plt.scatter(-44.45, -17.55, marker='+', color='blue', label='Chosen stations')
plt.scatter(-42.25, -19.75, marker='+', color='blue')
plt.scatter(-44.45, -21.95, marker='+', color='blue')
plt.scatter(-46.65, -19.75, marker='+', color='blue')
ax.set_title("ETo Points of Interest - Minas Gerais", fontsize=20)
ax.axis("off")
plt.legend(loc='best')
plt.show()
df1 = pd.read_csv('datasets/clima/lat-17.55_lon-44.45.csv', parse_dates=True, index_col='Unnamed: 0')
df1.head()
| Rs | u2 | Tmax | Tmin | RH | pr | ETo | |
|---|---|---|---|---|---|---|---|
| 2000-01-01 | 10.4 | 1.8 | 22.2 | 18.6 | 93.3 | 30.2 | 2.1 |
| 2000-01-02 | 10.7 | 1.4 | 22.5 | 18.3 | 91.8 | 49.0 | 2.2 |
| 2000-01-03 | 14.4 | 2.3 | 26.0 | 17.9 | 86.4 | 26.5 | 3.1 |
| 2000-01-04 | 13.4 | 1.2 | 25.7 | 18.8 | 85.7 | 16.7 | 2.9 |
| 2000-01-05 | 24.1 | 1.0 | 29.2 | 18.2 | 76.7 | 2.1 | 4.9 |
df1.plot(title='Data for Francisco Dumont/MG')
plt.legend(loc='best')
plt.show()
df1['ETo'].plot(title='ETo of Francisco Dumont/MG')
plt.legend(loc='best')
plt.show()
df2 = pd.read_csv('datasets/clima/lat-19.75_lon-42.25.csv', parse_dates=True, index_col='Unnamed: 0')
df2.head()
| Rs | u2 | Tmax | Tmin | RH | pr | ETo | |
|---|---|---|---|---|---|---|---|
| 2000-01-01 | 10.5 | 1.4 | 25.4 | 19.2 | 91.0 | 12.1 | 2.3 |
| 2000-01-02 | 12.2 | 1.9 | 26.9 | 18.9 | 89.2 | 16.0 | 2.7 |
| 2000-01-03 | 17.0 | 1.2 | 28.5 | 18.6 | 81.8 | 16.9 | 3.6 |
| 2000-01-04 | 13.5 | 1.4 | 28.3 | 19.4 | 84.2 | 0.0 | 3.1 |
| 2000-01-05 | 24.2 | 1.0 | 30.1 | 19.6 | 80.2 | 0.0 | 4.9 |
df2.plot(title='Data for Bom Jesus do Galho/MG')
plt.legend(loc='best')
plt.show()
df2['ETo'].plot(title='ETo of Bom Jesus do Galho/MG')
plt.legend(loc='best')
plt.show()
df3 = pd.read_csv('datasets/clima/lat-21.95_lon-44.45.csv', parse_dates=True, index_col='Unnamed: 0')
df3.head()
| Rs | u2 | Tmax | Tmin | RH | pr | ETo | |
|---|---|---|---|---|---|---|---|
| 2000-01-01 | 10.6 | 1.3 | 22.3 | 16.8 | 92.2 | 32.0 | 2.2 |
| 2000-01-02 | 10.6 | 1.9 | 20.7 | 15.9 | 93.9 | 65.6 | 2.0 |
| 2000-01-03 | 10.8 | 1.3 | 20.0 | 15.1 | 94.6 | 72.8 | 2.1 |
| 2000-01-04 | 11.5 | 0.9 | 23.4 | 15.9 | 90.2 | 49.4 | 2.4 |
| 2000-01-05 | 19.4 | 1.0 | 27.8 | 15.9 | 80.7 | 22.7 | 3.9 |
df3.plot(title='Data for Carvalhos/MG')
plt.legend(loc='best')
plt.show()
df3['ETo'].plot(title='ETo of Carvalhos/MG')
plt.legend(loc='best')
plt.show()
df4 = pd.read_csv('datasets/clima/lat-19.75_lon-46.65.csv', parse_dates=True, index_col='Unnamed: 0')
df4.head()
| Rs | u2 | Tmax | Tmin | RH | pr | ETo | |
|---|---|---|---|---|---|---|---|
| 2000-01-01 | 11.0 | 1.3 | 24.6 | 18.1 | 93.2 | 30.3 | 2.3 |
| 2000-01-02 | 10.6 | 2.0 | 24.9 | 18.5 | 90.8 | 29.6 | 2.3 |
| 2000-01-03 | 10.5 | 1.8 | 21.6 | 17.6 | 94.9 | 35.6 | 2.1 |
| 2000-01-04 | 10.7 | 1.6 | 24.2 | 17.6 | 93.9 | 38.1 | 2.2 |
| 2000-01-05 | 11.5 | 1.2 | 25.4 | 18.1 | 90.5 | 10.4 | 2.5 |
df4.plot(title='Data for Argenita/MG')
plt.legend(loc='best')
plt.show()
df4['ETo'].plot(title='ETo of Argenita/MG')
plt.legend(loc='best')
plt.show()
df_etos = pd.DataFrame()
df_etos['x1'] = df1['ETo']
df_etos['x2'] = df2['ETo']
df_etos['x3'] = df3['ETo']
df_etos['x4'] = df4['ETo']
df_etos['target'] = df_target['ETo']
df_etos.head()
| x1 | x2 | x3 | x4 | target | |
|---|---|---|---|---|---|
| 2000-01-01 | 2.1 | 2.3 | 2.2 | 2.3 | 2.2 |
| 2000-01-02 | 2.2 | 2.7 | 2.0 | 2.3 | 2.4 |
| 2000-01-03 | 3.1 | 3.6 | 2.1 | 2.1 | 2.6 |
| 2000-01-04 | 2.9 | 3.1 | 2.4 | 2.2 | 2.8 |
| 2000-01-05 | 4.9 | 4.9 | 3.9 | 2.5 | 4.7 |
df_etos.describe()
| x1 | x2 | x3 | x4 | target | |
|---|---|---|---|---|---|
| count | 7305.000000 | 7305.000000 | 7305.000000 | 7305.000000 | 7305.000000 |
| mean | 4.062368 | 3.533593 | 3.217180 | 3.765270 | 3.895592 |
| std | 1.145331 | 1.228129 | 1.136878 | 1.085725 | 1.165992 |
| min | 1.400000 | 1.100000 | 0.800000 | 1.000000 | 1.000000 |
| 25% | 3.100000 | 2.500000 | 2.300000 | 2.900000 | 2.900000 |
| 50% | 3.900000 | 3.300000 | 3.100000 | 3.600000 | 3.700000 |
| 75% | 4.900000 | 4.500000 | 4.000000 | 4.600000 | 4.800000 |
| max | 7.800000 | 7.400000 | 6.800000 | 6.800000 | 7.000000 |
sns.pairplot(df_etos)
<seaborn.axisgrid.PairGrid at 0x2a1ef940>
fields = ['x1', 'x2', 'x3', 'x4', 'target']  # the date is the index
x = df_etos[fields]
x.head(10)
| x1 | x2 | x3 | x4 | target | |
|---|---|---|---|---|---|
| 2000-01-01 | 2.1 | 2.3 | 2.2 | 2.3 | 2.2 |
| 2000-01-02 | 2.2 | 2.7 | 2.0 | 2.3 | 2.4 |
| 2000-01-03 | 3.1 | 3.6 | 2.1 | 2.1 | 2.6 |
| 2000-01-04 | 2.9 | 3.1 | 2.4 | 2.2 | 2.8 |
| 2000-01-05 | 4.9 | 4.9 | 3.9 | 2.5 | 4.7 |
| 2000-01-06 | 4.9 | 5.1 | 4.9 | 3.5 | 5.5 |
| 2000-01-07 | 5.2 | 5.3 | 5.2 | 3.8 | 5.7 |
| 2000-01-08 | 4.9 | 4.3 | 5.4 | 5.3 | 5.5 |
| 2000-01-09 | 5.4 | 2.8 | 5.9 | 5.2 | 5.5 |
| 2000-01-10 | 5.0 | 4.6 | 5.6 | 5.3 | 5.8 |
NON_DER = []
df_new = df_derived_by_shift(x, 4, NON_DER)
df_new.head()
| x1 | x2 | x3 | x4 | target | x1_1 | x1_2 | x1_3 | x1_4 | x2_1 | ... | x3_3 | x3_4 | x4_1 | x4_2 | x4_3 | x4_4 | target_1 | target_2 | target_3 | target_4 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000-01-01 | 2.1 | 2.3 | 2.2 | 2.3 | 2.2 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2000-01-02 | 2.2 | 2.7 | 2.0 | 2.3 | 2.4 | 2.1 | NaN | NaN | NaN | 2.3 | ... | NaN | NaN | 2.3 | NaN | NaN | NaN | 2.2 | NaN | NaN | NaN |
| 2000-01-03 | 3.1 | 3.6 | 2.1 | 2.1 | 2.6 | 2.2 | 2.1 | NaN | NaN | 2.7 | ... | NaN | NaN | 2.3 | 2.3 | NaN | NaN | 2.4 | 2.2 | NaN | NaN |
| 2000-01-04 | 2.9 | 3.1 | 2.4 | 2.2 | 2.8 | 3.1 | 2.2 | 2.1 | NaN | 3.6 | ... | 2.2 | NaN | 2.1 | 2.3 | 2.3 | NaN | 2.6 | 2.4 | 2.2 | NaN |
| 2000-01-05 | 4.9 | 4.9 | 3.9 | 2.5 | 4.7 | 2.9 | 3.1 | 2.2 | 2.1 | 3.1 | ... | 2.0 | 2.2 | 2.2 | 2.1 | 2.3 | 2.3 | 2.8 | 2.6 | 2.4 | 2.2 |
5 rows × 25 columns
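`df_derived_by_shift` is defined earlier in the notebook; judging by the 25-column result above (5 original columns plus 4 lags of each), it appends shifted copies of every column. A minimal hypothetical sketch of such a lag-feature builder, under that assumption:

```python
import pandas as pd

def derived_by_shift(df, lags, non_derived=()):
    """Append lagged copies (col_1 ... col_<lags>) of every column not in
    non_derived. Hypothetical sketch of the notebook's df_derived_by_shift."""
    out = df.copy()
    for col in df.columns:
        if col in non_derived:
            continue
        for k in range(1, lags + 1):
            out[f'{col}_{k}'] = df[col].shift(k)  # lag-k copy, NaN-padded at the top
    return out

demo = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [5, 4, 3, 2, 1]})
print(derived_by_shift(demo, 2).shape)  # (5, 6): 2 originals + 2 lags each
```

With 5 columns and 4 lags this yields 5 + 5 × 4 = 25 columns, matching the frame above; the first `lags` rows contain NaN and are dropped next.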
df_new = df_new.dropna()
df_new.head()
| x1 | x2 | x3 | x4 | target | x1_1 | x1_2 | x1_3 | x1_4 | x2_1 | ... | x3_3 | x3_4 | x4_1 | x4_2 | x4_3 | x4_4 | target_1 | target_2 | target_3 | target_4 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000-01-05 | 4.9 | 4.9 | 3.9 | 2.5 | 4.7 | 2.9 | 3.1 | 2.2 | 2.1 | 3.1 | ... | 2.0 | 2.2 | 2.2 | 2.1 | 2.3 | 2.3 | 2.8 | 2.6 | 2.4 | 2.2 |
| 2000-01-06 | 4.9 | 5.1 | 4.9 | 3.5 | 5.5 | 4.9 | 2.9 | 3.1 | 2.2 | 4.9 | ... | 2.1 | 2.0 | 2.5 | 2.2 | 2.1 | 2.3 | 4.7 | 2.8 | 2.6 | 2.4 |
| 2000-01-07 | 5.2 | 5.3 | 5.2 | 3.8 | 5.7 | 4.9 | 4.9 | 2.9 | 3.1 | 5.1 | ... | 2.4 | 2.1 | 3.5 | 2.5 | 2.2 | 2.1 | 5.5 | 4.7 | 2.8 | 2.6 |
| 2000-01-08 | 4.9 | 4.3 | 5.4 | 5.3 | 5.5 | 5.2 | 4.9 | 4.9 | 2.9 | 5.3 | ... | 3.9 | 2.4 | 3.8 | 3.5 | 2.5 | 2.2 | 5.7 | 5.5 | 4.7 | 2.8 |
| 2000-01-09 | 5.4 | 2.8 | 5.9 | 5.2 | 5.5 | 4.9 | 5.2 | 4.9 | 4.9 | 4.3 | ... | 4.9 | 3.9 | 5.3 | 3.8 | 3.5 | 2.5 | 5.5 | 5.7 | 5.5 | 4.7 |
5 rows × 25 columns
df_new.corr()
| x1 | x2 | x3 | x4 | target | x1_1 | x1_2 | x1_3 | x1_4 | x2_1 | ... | x3_3 | x3_4 | x4_1 | x4_2 | x4_3 | x4_4 | target_1 | target_2 | target_3 | target_4 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x1 | 1.000000 | 0.779660 | 0.572394 | 0.673563 | 0.818432 | 0.808354 | 0.688298 | 0.604119 | 0.551753 | 0.701051 | ... | 0.480812 | 0.440096 | 0.658057 | 0.582857 | 0.513447 | 0.461159 | 0.748016 | 0.645322 | 0.562314 | 0.507240 |
| x2 | 0.779660 | 1.000000 | 0.700405 | 0.640360 | 0.811726 | 0.677007 | 0.582250 | 0.522083 | 0.488851 | 0.784082 | ... | 0.569185 | 0.522505 | 0.642308 | 0.565833 | 0.492845 | 0.448007 | 0.733317 | 0.630401 | 0.551571 | 0.502079 |
| x3 | 0.572394 | 0.700405 | 1.000000 | 0.793822 | 0.774858 | 0.497210 | 0.440292 | 0.414338 | 0.409969 | 0.582797 | ... | 0.548417 | 0.515298 | 0.656966 | 0.525570 | 0.453505 | 0.416872 | 0.633741 | 0.530960 | 0.476539 | 0.452356 |
| x4 | 0.673563 | 0.640360 | 0.793822 | 1.000000 | 0.844529 | 0.570686 | 0.501539 | 0.457320 | 0.437959 | 0.543532 | ... | 0.452001 | 0.411746 | 0.739909 | 0.569562 | 0.476092 | 0.429567 | 0.674316 | 0.544038 | 0.476691 | 0.442275 |
| target | 0.818432 | 0.811726 | 0.774858 | 0.844529 | 1.000000 | 0.689353 | 0.587744 | 0.526335 | 0.495982 | 0.685596 | ... | 0.516355 | 0.469410 | 0.739108 | 0.595171 | 0.503852 | 0.449736 | 0.788732 | 0.635080 | 0.544451 | 0.496099 |
| x1_1 | 0.808354 | 0.677007 | 0.497210 | 0.570686 | 0.689353 | 1.000000 | 0.808304 | 0.688223 | 0.604109 | 0.779600 | ... | 0.536941 | 0.480677 | 0.673517 | 0.657984 | 0.582808 | 0.513344 | 0.818379 | 0.747972 | 0.645316 | 0.562307 |
| x1_2 | 0.688298 | 0.582250 | 0.440292 | 0.501539 | 0.587744 | 0.808304 | 1.000000 | 0.808232 | 0.688187 | 0.676932 | ... | 0.592017 | 0.536820 | 0.570611 | 0.673431 | 0.657929 | 0.582708 | 0.689261 | 0.818340 | 0.747959 | 0.645293 |
| x1_3 | 0.604119 | 0.522083 | 0.414338 | 0.457320 | 0.526335 | 0.688223 | 0.808232 | 1.000000 | 0.808281 | 0.582110 | ... | 0.572020 | 0.591912 | 0.501531 | 0.570581 | 0.673430 | 0.657888 | 0.587621 | 0.689219 | 0.818371 | 0.748004 |
| x1_4 | 0.551753 | 0.488851 | 0.409969 | 0.437959 | 0.495982 | 0.604109 | 0.688187 | 0.808281 | 1.000000 | 0.521972 | ... | 0.496922 | 0.571991 | 0.457391 | 0.501598 | 0.570630 | 0.673463 | 0.526309 | 0.587631 | 0.689256 | 0.818402 |
| x2_1 | 0.701051 | 0.784082 | 0.582797 | 0.543532 | 0.685596 | 0.779600 | 0.676932 | 0.582110 | 0.521972 | 1.000000 | ... | 0.642084 | 0.569084 | 0.640252 | 0.642185 | 0.565730 | 0.492713 | 0.811684 | 0.733242 | 0.630326 | 0.551481 |
| x2_2 | 0.605433 | 0.650588 | 0.510284 | 0.474396 | 0.583125 | 0.700914 | 0.779500 | 0.676709 | 0.581903 | 0.784028 | ... | 0.721210 | 0.641944 | 0.543332 | 0.640039 | 0.642013 | 0.565527 | 0.685461 | 0.811561 | 0.733105 | 0.630159 |
| x2_3 | 0.539449 | 0.578971 | 0.487843 | 0.441044 | 0.526357 | 0.605346 | 0.700850 | 0.779421 | 0.676649 | 0.650519 | ... | 0.700134 | 0.721150 | 0.474310 | 0.543231 | 0.639976 | 0.641930 | 0.583023 | 0.685400 | 0.811532 | 0.733065 |
| x2_4 | 0.494432 | 0.543735 | 0.478477 | 0.420029 | 0.495066 | 0.539432 | 0.605326 | 0.700862 | 0.779416 | 0.578911 | ... | 0.582550 | 0.700130 | 0.441052 | 0.474314 | 0.543238 | 0.639982 | 0.526334 | 0.583017 | 0.685410 | 0.811534 |
| x3_1 | 0.592259 | 0.721416 | 0.786957 | 0.666275 | 0.717551 | 0.572260 | 0.497059 | 0.440119 | 0.414270 | 0.700338 | ... | 0.630130 | 0.548277 | 0.793759 | 0.656850 | 0.525476 | 0.453345 | 0.774776 | 0.633650 | 0.530910 | 0.476484 |
| x3_2 | 0.537084 | 0.642176 | 0.630271 | 0.527061 | 0.595080 | 0.592142 | 0.572138 | 0.496930 | 0.440097 | 0.721346 | ... | 0.786805 | 0.630022 | 0.666220 | 0.793698 | 0.656804 | 0.525356 | 0.717453 | 0.774734 | 0.633642 | 0.530896 |
| x3_3 | 0.480812 | 0.569185 | 0.548417 | 0.452001 | 0.516355 | 0.536941 | 0.592017 | 0.572020 | 0.496922 | 0.642084 | ... | 1.000000 | 0.786741 | 0.526985 | 0.666135 | 0.793686 | 0.656713 | 0.594925 | 0.717400 | 0.774760 | 0.633651 |
| x3_4 | 0.440096 | 0.522505 | 0.515298 | 0.411746 | 0.469410 | 0.480677 | 0.536820 | 0.591912 | 0.571991 | 0.569084 | ... | 0.786741 | 1.000000 | 0.451915 | 0.526878 | 0.666086 | 0.793629 | 0.516205 | 0.594845 | 0.717390 | 0.774751 |
| x4_1 | 0.658057 | 0.642308 | 0.656966 | 0.739909 | 0.739108 | 0.673517 | 0.570611 | 0.501531 | 0.457391 | 0.640252 | ... | 0.526985 | 0.451915 | 1.000000 | 0.739906 | 0.569567 | 0.476063 | 0.844499 | 0.674290 | 0.544067 | 0.476741 |
| x4_2 | 0.582857 | 0.565833 | 0.525570 | 0.569562 | 0.595171 | 0.657984 | 0.673431 | 0.570581 | 0.501598 | 0.642185 | ... | 0.666135 | 0.526878 | 0.739906 | 1.000000 | 0.739905 | 0.569519 | 0.739029 | 0.844476 | 0.674317 | 0.544113 |
| x4_3 | 0.513447 | 0.492845 | 0.453505 | 0.476092 | 0.503852 | 0.582808 | 0.657929 | 0.673430 | 0.570630 | 0.565730 | ... | 0.793686 | 0.666086 | 0.569567 | 0.739905 | 1.000000 | 0.739898 | 0.595109 | 0.739012 | 0.844488 | 0.674347 |
| x4_4 | 0.461159 | 0.448007 | 0.416872 | 0.429567 | 0.449736 | 0.513344 | 0.582708 | 0.657888 | 0.673463 | 0.492713 | ... | 0.656713 | 0.793629 | 0.476063 | 0.569519 | 0.739898 | 1.000000 | 0.503717 | 0.595055 | 0.739033 | 0.844517 |
| target_1 | 0.748016 | 0.733317 | 0.633741 | 0.674316 | 0.788732 | 0.818379 | 0.689261 | 0.587621 | 0.526309 | 0.811684 | ... | 0.594925 | 0.516205 | 0.844499 | 0.739029 | 0.595109 | 0.503717 | 1.000000 | 0.788691 | 0.635069 | 0.544432 |
| target_2 | 0.645322 | 0.630401 | 0.530960 | 0.544038 | 0.635080 | 0.747972 | 0.818340 | 0.689219 | 0.587631 | 0.733242 | ... | 0.717400 | 0.594845 | 0.674290 | 0.844476 | 0.739012 | 0.595055 | 0.788691 | 1.000000 | 0.788697 | 0.635077 |
| target_3 | 0.562314 | 0.551571 | 0.476539 | 0.476691 | 0.544451 | 0.645316 | 0.747959 | 0.818371 | 0.689256 | 0.630326 | ... | 0.774760 | 0.717390 | 0.544067 | 0.674317 | 0.844488 | 0.739033 | 0.635069 | 0.788697 | 1.000000 | 0.788718 |
| target_4 | 0.507240 | 0.502079 | 0.452356 | 0.442275 | 0.496099 | 0.562307 | 0.645293 | 0.748004 | 0.818402 | 0.551481 | ... | 0.633651 | 0.774751 | 0.476741 | 0.544113 | 0.674347 | 0.844517 | 0.544432 | 0.635077 | 0.788718 | 1.000000 |
25 rows × 25 columns
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
colormap = plt.cm.RdBu
plt.figure(figsize=(15, 10))
plt.title('Correlations with lags up to 4 days', y=1.05, size=16)
mask = np.zeros_like(df_new.corr())
mask[np.triu_indices_from(mask)] = True  # hide the redundant upper triangle
svm = sns.heatmap(df_new.corr(), mask=mask, linewidths=0.1, vmax=1.0,
                  square=True, cmap=colormap, linecolor='white', annot=True)
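Since a correlation matrix is symmetric, the mask hides the redundant upper triangle (diagonal included). A tiny demonstration of what `np.triu_indices_from` marks:

```python
import numpy as np

# Build the same kind of boolean mask used for the heatmap above.
m = np.zeros((3, 3))
m[np.triu_indices_from(m)] = True  # upper triangle incl. diagonal -> 1.0
print(m)
# [[1. 1. 1.]
#  [0. 1. 1.]
#  [0. 0. 1.]]
```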
# Normalize the data
# Save the min/max values for denormalization later
min_raw = df_etos.min()
max_raw = df_etos.max()
# Perform the normalization
norm_df = normalize(df_etos)
df = df_etos
# Splitting the dataset into training and test sets, 80% / 20%.
size = int(len(norm_df) * 0.80)
train, test = norm_df[0:size], norm_df[size:len(norm_df)]
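`normalize` and `denormalize` are helpers defined earlier in the notebook. Assuming plain column-wise min-max scaling (consistent with saving `min_raw`/`max_raw` above), a minimal sketch of what they might look like:

```python
import numpy as np
import pandas as pd

def normalize(df):
    """Min-max scale each column to [0, 1].
    Hypothetical sketch of the notebook's normalize helper."""
    return (df - df.min()) / (df.max() - df.min())

def denormalize(values, min_raw, max_raw):
    """Map [0, 1] values of shape (steps, n_columns) back to the original
    scale, column by column."""
    return np.asarray(values) * (max_raw.values - min_raw.values) + min_raw.values

demo = pd.DataFrame({'a': [1.0, 4.0, 7.0]})
print(normalize(demo)['a'].tolist())  # [0.0, 0.5, 1.0]
```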
# VAR model.
order = 3
model = VAR(train.values)
model_fit = model.fit(maxlags=order)
lag_order = model_fit.k_ar
print(f'lag_order = {lag_order}\n')
print(model_fit.summary())
lag_order = 3
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Mon, 11, Dec, 2023
Time: 22:35:40
--------------------------------------------------------------------
No. of Equations: 5.00000 BIC: -24.0400
Nobs: 5841.00 HQIC: -24.0996
Log likelihood: 29115.6 FPE: 3.31034e-11
AIC: -24.1314 Det(Omega_mle): 3.26537e-11
--------------------------------------------------------------------
Results for equation y1
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 0.038429 0.003985 9.643 0.000
L1.y1 0.497902 0.016742 29.740 0.000
L1.y2 0.126919 0.014987 8.469 0.000
L1.y3 0.124682 0.015239 8.182 0.000
L1.y4 0.103962 0.015938 6.523 0.000
L1.y5 0.086291 0.019306 4.470 0.000
L2.y1 0.091085 0.018063 5.043 0.000
L2.y2 -0.030933 0.015971 -1.937 0.053
L2.y3 -0.083846 0.017289 -4.850 0.000
L2.y4 -0.021265 0.016860 -1.261 0.207
L2.y5 -0.039503 0.019973 -1.978 0.048
L3.y1 0.087351 0.016260 5.372 0.000
L3.y2 -0.004889 0.014590 -0.335 0.738
L3.y3 -0.021440 0.016006 -1.340 0.180
L3.y4 0.034687 0.016087 2.156 0.031
L3.y5 -0.055626 0.019244 -2.891 0.004
========================================================================
Results for equation y2
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 0.017179 0.004346 3.953 0.000
L1.y1 0.148346 0.018257 8.126 0.000
L1.y2 0.423960 0.016343 25.941 0.000
L1.y3 0.346178 0.016618 20.831 0.000
L1.y4 0.046377 0.017380 2.668 0.008
L1.y5 0.014610 0.021053 0.694 0.488
L2.y1 -0.029502 0.019698 -1.498 0.134
L2.y2 0.028271 0.017416 1.623 0.105
L2.y3 -0.049274 0.018854 -2.614 0.009
L2.y4 -0.058452 0.018386 -3.179 0.001
L2.y5 -0.006885 0.021781 -0.316 0.752
L3.y1 0.034696 0.017732 1.957 0.050
L3.y2 0.088317 0.015910 5.551 0.000
L3.y3 0.009865 0.017454 0.565 0.572
L3.y4 -0.012436 0.017543 -0.709 0.478
L3.y5 -0.043133 0.020985 -2.055 0.040
========================================================================
Results for equation y3
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 0.060087 0.004596 13.075 0.000
L1.y1 0.046026 0.019306 2.384 0.017
L1.y2 0.004992 0.017282 0.289 0.773
L1.y3 0.672639 0.017573 38.276 0.000
L1.y4 0.147630 0.018379 8.033 0.000
L1.y5 -0.051329 0.022263 -2.306 0.021
L2.y1 -0.025661 0.020830 -1.232 0.218
L2.y2 -0.013107 0.018417 -0.712 0.477
L2.y3 -0.002259 0.019937 -0.113 0.910
L2.y4 -0.091614 0.019442 -4.712 0.000
L2.y5 0.007301 0.023032 0.317 0.751
L3.y1 0.018079 0.018751 0.964 0.335
L3.y2 0.120310 0.016824 7.151 0.000
L3.y3 0.066509 0.018457 3.603 0.000
L3.y4 -0.020918 0.018551 -1.128 0.259
L3.y5 -0.016818 0.022192 -0.758 0.449
========================================================================
Results for equation y4
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 0.097556 0.004904 19.892 0.000
L1.y1 0.078847 0.020603 3.827 0.000
L1.y2 -0.022295 0.018443 -1.209 0.227
L1.y3 0.275957 0.018754 14.715 0.000
L1.y4 0.494779 0.019614 25.226 0.000
L1.y5 0.025223 0.023759 1.062 0.288
L2.y1 0.035668 0.022229 1.605 0.109
L2.y2 0.004500 0.019654 0.229 0.819
L2.y3 -0.130471 0.021277 -6.132 0.000
L2.y4 0.024574 0.020748 1.184 0.236
L2.y5 -0.046558 0.024580 -1.894 0.058
L3.y1 0.045880 0.020011 2.293 0.022
L3.y2 0.046761 0.017954 2.604 0.009
L3.y3 -0.007318 0.019697 -0.372 0.710
L3.y4 0.042938 0.019797 2.169 0.030
L3.y5 -0.024864 0.023682 -1.050 0.294
========================================================================
Results for equation y5
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 0.072899 0.004437 16.430 0.000
L1.y1 0.155153 0.018640 8.324 0.000
L1.y2 0.074389 0.016686 4.458 0.000
L1.y3 0.295725 0.016967 17.429 0.000
L1.y4 0.204022 0.017745 11.498 0.000
L1.y5 0.301184 0.021495 14.012 0.000
L2.y1 -0.001423 0.020111 -0.071 0.944
L2.y2 0.003697 0.017782 0.208 0.835
L2.y3 -0.116327 0.019249 -6.043 0.000
L2.y4 -0.068600 0.018772 -3.654 0.000
L2.y5 0.009651 0.022238 0.434 0.664
L3.y1 0.048895 0.018104 2.701 0.007
L3.y2 0.042151 0.016244 2.595 0.009
L3.y3 0.000694 0.017820 0.039 0.969
L3.y4 0.000697 0.017911 0.039 0.969
L3.y5 -0.019193 0.021426 -0.896 0.370
========================================================================
Correlation matrix of residuals
y1 y2 y3 y4 y5
y1 1.000000 0.499386 0.209582 0.341164 0.556666
y2 0.499386 1.000000 0.316625 0.255946 0.517417
y3 0.209582 0.316625 1.000000 0.628955 0.550458
y4 0.341164 0.255946 0.628955 1.000000 0.662607
y5 0.556666 0.517417 0.550458 0.662607 1.000000
fc = model_fit.forecast(y=test.values, steps=3)  # forecast 3 steps ahead for all variables (remember to denormalize!)
dfc = denormalize(fc, min_raw, max_raw)
# Applying the forecast to the data using rolling forecasting.
order = 1
history = [x for x in train]
predictions = []
for t in range(len(test)):
    model = VAR(train.values)
    model_fit = model.fit(maxlags=order)
    fc = model_fit.forecast(y=test[:t+1].values, steps=1)
    yhat = fc[0][4]  # column 4 holds the normalized target ETo
    predictions.append(yhat)
    obs = test['target'].iloc[t]
    history.append(obs)
    if t < 3 or t >= (len(test) - 3):  # print only the first and last 3 steps
        print('predicted=%f, observed=%f' % (yhat, obs))
predicted=0.731906, observed=0.716667 predicted=0.532807, observed=0.433333 predicted=0.368967, observed=0.316667 ... predicted=0.743142, observed=0.683333 predicted=0.764702, observed=0.766667 predicted=0.777312, observed=0.850000
# Measuring the model's performance with the MAE, RMSE and MAPE metrics
mae_var_ext = mean_absolute_error(test['target'], predictions)
print('Test MAE: %.3f' % mae_var_ext)
rmse_var_ext = sqrt(mean_squared_error(test['target'], predictions))
print('Test RMSE: %.3f' % rmse_var_ext)
mape_var_ext = mean_absolute_percentage_error(test['target'], predictions)
print('Test MAPE: %.3f' % mape_var_ext)
Test MAE: 0.052 Test RMSE: 0.067 Test MAPE: 0.144
fig, ax = plt.subplots()
ax.plot(df_target['ETo'].index[size:len(df_target)], test['target'], 'b-', label='Test')
ax.plot(df_target['ETo'].index[size:len(df_target)], predictions, 'r--', label='Predicted')
ax.legend(loc='best')
ax.set_xlabel('Year')
ax.set_ylabel('Value')
plt.title("ETo of Pará de Minas/MG")
fig.autofmt_xdate()
plt.tight_layout()
# Illustrating graphically the performance of the VAR model.
fig, ax = plt.subplots()
ax.plot(df_target['ETo'].index[0:size], train['target'], 'g-.', label='Train')
ax.plot(df_target['ETo'].index[size:len(df_target)], test['target'], 'b-', label='Test')
ax.plot(df_target['ETo'].index[size:len(df_target)], predictions, 'r--', label='Predicted')
ax.legend(loc='best')
ax.set_xlabel('Year')
ax.set_ylabel('Value')
ax.axvspan(df_target['ETo'].index[size], df_target['ETo'].index[len(df_target) - 1], color='#808080', alpha=0.2)  # gray zone marking the test period
plt.title("ETo of Pará de Minas/MG")
fig.autofmt_xdate()
plt.tight_layout()
metrics_names = ['mae_ar', 'mae_var_int', 'mae_var_ext', 'rmse_ar', 'rmse_var_int', 'rmse_var_ext', 'mape_ar', 'mape_var_int', 'mape_var_ext']
metrics_values = [mae_ar, mae_var_int, mae_var_ext, rmse_ar, rmse_var_int, rmse_var_ext, mape_ar, mape_var_int, mape_var_ext]
colors = ['tab:red', 'tab:blue', 'tab:orange']
# labels starting with '_' are hidden from the legend, so each model appears only once
bar_labels = ['AR (main ETo)', 'VAR (meteorological variables of the main station)', 'VAR (EToS around the main station)', '_AR (main ETo)', '_VAR (meteorological variables of the main station)', '_VAR (EToS around the main station)', '_AR (main ETo)', '_VAR (meteorological variables of the main station)', '_VAR (EToS around the main station)']
fig, ax = plt.subplots()
bar_container = ax.bar(metrics_names, metrics_values, color=colors, label=bar_labels)
ax.set(ylabel='Values', xlabel='Metrics', title='Assessment Results')
ax.bar_label(bar_container, fmt='{:,.2f}')
ax.legend(title='Models')
<matplotlib.legend.Legend at 0x2c067b27e50>
Remarks: Across all the metrics used, the multivariate VAR model that forecasts the ETo from the "internal" variables (Rs, u2, Tmax, Tmin, RH, pr) of the main station (Pará de Minas/MG) performed best. The multivariate model that forecasts the main station's ETo from the ETo series of the other selected stations (Francisco Dumont/MG, Bom Jesus do Galho/MG, Carvalhos/MG and Argenita/MG) performed slightly worse. The univariate AR model of order 1 showed the worst performance.
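The comparison can also be read off a small table. A sketch using only the two VAR variants evaluated in this section (the AR baseline is computed earlier in the notebook, so its values are not repeated here):

```python
import pandas as pd

# Metrics reported above for the two VAR variants (normalized scale).
summary = pd.DataFrame(
    {'MAE': [0.042, 0.052], 'RMSE': [0.052, 0.067], 'MAPE': [0.124, 0.144]},
    index=['VAR (internal variables)', 'VAR (neighbouring EToS)'],
)
print(summary)
print(summary['MAE'].idxmin())  # VAR (internal variables)
```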